Hierarchical Group-Based Sampling

Authors

  • Rainer Gemulla
  • Henrike Berthold
  • Wolfgang Lehner
Abstract

Approximate query processing is an adequate technique to reduce response times and system load in cases where approximate results suffice. In the database literature, sampling has been proposed to evaluate queries approximately by using only a subset of the original data. Unfortunately, most of these methods consider either only certain problems arising from the use of samples in databases (e.g., data skew) or only join operations involving multiple relations. We describe how well-known sampling techniques dealing with group-by operations can be combined with foreign-key joins such that the join is computed after the generation of the sample. In detail, we show how senate sampling and small group sampling can be combined efficiently with the idea of join synopses. Additionally, we introduce different algorithms which maintain the sample if the underlying data changes. Finally, we demonstrate the superiority of our method over the naive approach in an extensive set of experiments.
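
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that senate sampling means giving every group an equal share of the sample budget, and that the foreign-key join is applied only to the sampled rows. The function names (`senate_sample`, `join_foreign_key`), the column names, and the `scale` field are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def senate_sample(rows, group_key, total_size, rng):
    """Illustrative senate-style sample: each group gets an equal share.

    Every sampled row carries a 'scale' factor (group size / rows drawn)
    so that per-group aggregates can be scaled up afterwards.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    if not groups:
        return []
    per_group = max(1, total_size // len(groups))
    sample = []
    for members in groups.values():
        k = min(per_group, len(members))  # small groups are taken whole
        scale = len(members) / k
        for row in rng.sample(members, k):
            sample.append({**row, "scale": scale})
    return sample


def join_foreign_key(sample, dimension, key):
    """Join sampled fact rows with a dimension table on a shared key column."""
    index = {d[key]: d for d in dimension}
    return [{**row, **index[row[key]]} for row in sample if row[key] in index]


# Hypothetical data: 8 orders in region "EU", 2 in region "US".
orders = [
    {"order_id": i, "customer_id": i % 3,
     "region": "EU" if i < 8 else "US", "amount": 10.0}
    for i in range(10)
]
customers = [
    {"customer_id": 0, "segment": "retail"},
    {"customer_id": 1, "segment": "corporate"},
    {"customer_id": 2, "segment": "retail"},
]

sample = senate_sample(orders, "region", 4, random.Random(0))
joined = join_foreign_key(sample, customers, "customer_id")

# Estimate SUM(amount) per region from the joined sample.
estimate = defaultdict(float)
for row in joined:
    estimate[row["region"]] += row["amount"] * row["scale"]
print(dict(estimate))
```

Because the join runs over the sample rather than the full fact table, its cost depends on the sample size. How the paper's method chooses group allocations and maintains the sample under updates is not specified in the abstract and is not modeled here.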

Similar resources

Hierarchical Group Compromise Ranking Methodology Based on Euclidean–Hausdorff Distance Measure Under Uncertainty: An Application to Facility Location Selection Problem

A hierarchical group compromise method can be regarded as one of the major multi-attribute decision-making tools for ranking the possible alternatives among conflicting criteria. Decision makers’ (DMs’) judgments are considered imprecise or fuzzy in complex and hesitant situations. In the group decision making, an aggregation of DMs’ judgments and fuzzy group compromi...

Models of EFL Learners’ Vocabulary Development: Spreading Activation vs. Hierarchical Network Model

Semantic network approaches view the organization or representation of the internal lexicon as either a spreading or a hierarchical system, identified, respectively, as the Spreading Activation Model (SAM) and the Hierarchical Network Model (HNM). However, the validity of either model is among the unexplored issues in the literature, which can be studied through basing the instruction compatible wi...

Solving New Product Selection Problem by a New Hierarchical Group Decision-making Approach with Hesitant Fuzzy Setting

Selecting the most suitable alternative under uncertainty is a critical decision-making problem that affects the success of organizations. In the selection process, a group of decision makers considers a number of assessment criteria, which can often be arranged in a multi-level hierarchy. The aim of this paper is to introduce a new hierarchical multi...

Anomaly Detection in Hierarchical Data Streams under Unknown Models

We consider the problem of detecting a few targets among a large number of hierarchical data streams. The data streams are modeled as random processes with unknown and potentially heavy-tailed distributions. The objective is an active inference strategy that determines, sequentially, which data stream to collect samples from in order to minimize the sample complexity under a reliability constra...

DENDIS: A new density-based sampling for clustering algorithm

To deal with large datasets, sampling can be used as a preprocessing step for clustering. In this paper, a hybrid sampling algorithm is proposed. It is density-based while managing distance concepts to ensure space coverage and fit cluster shapes. At each step a new item is added to the sample: it is chosen as the furthest from the representative in the most important group. A constraint on th...

Group Analysis of Resting-State fMRI by Hierarchical Markov Random Fields

Identifying functional networks from resting-state functional MRI is a challenging task, especially for multiple subjects. Most current studies estimate the networks in a sequential approach, i.e., they identify each individual subject's network independently of other subjects, and then estimate the group network from the subjects' networks. This one-way flow of information prevents one subject'...

Publication date: 2005